Cross Language Text Classification via Subspace Co-regularized Multi-view Learning
نویسندگان
چکیده
In many multilingual text classification problems, the documents in different languages often share the same set of categories. To reduce the labeling cost of training a classification model for each individual language, it is important to transfer the label knowledge gained from one language to another language by conducting cross language classification. In this paper we develop a novel subspace co-regularized multi-view learning method for cross language text classification. This method is built on parallel corpora produced by machine translation. It jointly minimizes the training error of each classifier in each language while penalizing the distance between the subspace representations of parallel documents. Our empirical study on a large set of cross language text classification tasks shows the proposed method consistently outperforms a number of inductive methods, domain adaptation methods, and multi-view learning methods.
منابع مشابه
Cross Language Text Classification via Multi-view Subspace Learning
Cross language classification is an important task in multilingual learning, aiming for reducing the labeling cost of training a different classification model for each individual language. In this paper we develop a novel subspace co-regularized multi-view learning method for cross language text classification. The empirical study on a set of cross language text classification tasks shows the ...
متن کاملA Subspace Learning Framework for Cross-Lingual Sentiment Classification with Partial Parallel Data
Cross-lingual sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature spaces of the source language data and that of the target language data. To addres...
متن کاملA Unifying Framework in Vector-valued Reproducing Kernel Hilbert Spaces for Manifold Regularization and Co-Regularized Multi-view Learning
This paper presents a general vector-valued reproducing kernel Hilbert spaces (RKHS) framework for the problem of learning an unknown functional dependency between a structured input space and a structured output space. Our formulation encompasses both Vector-valued Manifold Regularization and Co-regularized Multi-view Learning, providing in particular a unifying framework linking these two imp...
متن کاملSemi-supervised Subspace Co-Projection for Multi-class Heterogeneous Domain Adaptation
Heterogeneous domain adaptation aims to exploit labeled training data from a source domain for learning prediction models in a target domain under the condition that the two domains have different input feature representation spaces. In this paper, we propose a novel semi-supervised subspace co-projection method to address multiclass heterogeneous domain adaptation. The proposed method projects...
متن کاملImage Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کامل